Abstract



The data analysis problem posed by a repeated measures design that 
includes a single observation on a covariate for each subject is con- 
sidered. The current paper discusses how best to capture a possible 
dependence of the effect of the within-subject factor on the level of 
the covariate. Procedures originally explicated by Rogosa (1980) for 
dealing with heterogeneity of regression in the between-subjects case
are extended to this repeated measures situation. We conclude that such 
pick-a-point and simultaneous inferential procedures not only provide 
more powerful overall tests of the within-subject effects but also
permit a thorough analysis of the attribute-by-treatment interaction
implied by a significant regression of the within effect on the
covariate.

ANCOVA and Repeated Measures: Dealing with Heterogeneity of Regression

Delaney & Maxwell

Since almost by definition educational researchers are interested
in examining progress, it is not surprising that repeated measures
designs are among the most commonly used in educational research. In
addition, individual differences are frequently a component if not the
focus of educational studies. It is therefore important that
researchers be facile at analyzing designs that incorporate both
repeated measures and individual difference variables. Unfortunately,
this is an area in which there has been confusion and controversy
regarding appropriate methods of analysis.

Although it is well known that the repeated measures design itself
"controls" for individual differences in level of performance on the
dependent variable, it has not generally been viewed as a vehicle for
specifically examining individual differences. It is possible, however,
to combine a repeated measures analysis with an analysis of covariance
as a means of incorporating individual difference variables. Recent
papers on the subject by Ceurvorst and Stock (1978), Delaney and Maxwell
(1981), and Algina (1982) have clarified some of the issues involved in
using such an approach. But these papers have not made clear how best
to capture the possible dependence of the effect of the repeated
measures factor on the characteristics of the individual taking part in
the study. Thus, our principal concern in the current paper is with
describing how one can most effectively utilize the information
available when an individual difference variable or covariate has been
included in a repeated measures design.

Naturally, the conclusion one draws about what analysis is most
appropriate depends upon the assumptions made at the beginning about the
structure of the problem. In the traditional univariate approach to
analyzing repeated measures designs (Keppel, 1982, p. 367 ff.), in
addition to the restrictive assumptions made about interrelationships
among the dependent variables, it is also typically assumed that the
slope of the regression of the dependent measure on the covariate is the
same for all measures (cf. Ceurvorst and Stock, 1978). Thus when
computing "adjusted effects" the same adjustment would be made on each
of the dependent measures, and in contrasts assessing the within-subject
effect such adjustments would drop out entirely.
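
To see why a common adjustment cancels, consider the following sketch
for two measures. The symbols $b$ (the common slope) and $Y_j'$ (the
adjusted score) are shorthand introduced here for illustration and are
not notation taken from the sources cited above.

```latex
% Adjusted scores under a common slope b (illustrative notation)
Y_j' = Y_j - b\,(X - \bar{X}), \qquad j = 1, 2
% The within-subject contrast of the adjusted scores
Y_2' - Y_1' = \bigl[Y_2 - b\,(X - \bar{X})\bigr] - \bigl[Y_1 - b\,(X - \bar{X})\bigr] = Y_2 - Y_1
```

Because the same term is subtracted from both measures, the contrast of
adjusted scores is identical to the contrast of raw scores.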

An advantage of the multivariate approach to repeated measures
designs (e.g., McCall & Appelbaum, 1974) is that the interrelationships
among the dependent variables as well as between each dependent variable
and the covariate are not constrained. One implication of this lack of
constraint is that it is possible to make adjustments, via analysis of
covariance (ANCOVA), of effects involving the within-subjects factor
(Delaney & Maxwell, 1981). Such effects are assessed in the multivariate
approach by forming combinations of the dependent variables to represent
contrasts of interest in the levels of the within-subject factor(s).

Testing whether the grand mean(s) of the new variable(s) equals
zero assesses the main effect of the within-subject factor. Further, a
covariate can be included in the model to remove variability in the
trend variable(s) predictable by these previously observed individual
differences among subjects. Delaney and Maxwell (1981) showed that,
under the typical ANCOVA assumptions, one can thereby achieve a valid
test of the within-subjects factor that will generally be more powerful
than the unadjusted test. However, it was noted that the proportional
reduction in error variance will usually not be as great as in the
between-subjects design, in part because of the unreliability common to
difference scores. In order to correct an erroneous example of this
type of analysis offered by Ceurvorst and Stock (1978), the technical
point was also noted that the covariate needed to be "centered" or
expressed in deviation score form in order for the estimate of the
intercept in the ANCOVA situation to correspond to the estimate of the
grand mean in the unadjusted test.
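
The role of centering can be seen from the usual least-squares
intercept. The sketch below uses generic symbols ($Y$ for the trend
variable, $\hat{a}$ and $\hat{b}$ for the fitted intercept and slope)
that are assumptions introduced here for illustration.

```latex
% Least-squares intercept with the raw covariate X
\hat{a} = \bar{Y} - \hat{b}\,\bar{X}
% With the centered covariate x = X - \bar{X}, the mean of x is zero
\bar{x} = 0 \;\Longrightarrow\; \hat{a} = \bar{Y} - \hat{b}\cdot 0 = \bar{Y}
```

With an uncentered covariate the intercept instead estimates the mean of
the trend variable at X = 0, which need not be a value of interest.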

In our previous paper we mentioned the fact that the regression of
the trend variable on the covariate could be viewed as an indication of
an attribute-by-treatment interaction (ATI). However, we did not
develop this point in detail, and now see the need for procedures to
specify the nature of the ATI. Fortunately, a set of analytic
procedures developed for dealing with heterogeneity of regression in
between-groups designs can be fruitfully extended to the repeated
measures case. It is the primary purpose of the current paper to detail
how this can be done. The paper will begin with consideration of the
simplest possible within-subjects design with a covariate, and then deal
with more complex designs.

Throughout the paper we will be making the assumption typically
made in ANCOVA that the covariate, X, is fixed. This does not mean that
the values of X must be specified by the experimenter in advance, but
rather that the inferences are made to subpopulations of subjects having
the same values on X as those observed. If this assumption is not made,
then certain of the parameter estimates of interest will be less precise
(see Algina, 1982). The only distributional assumptions we require are
that the residual errors associated with the trend variable be
independently and identically distributed as normal random variables, or
in the case of multiple trend variables, that they jointly follow a
multivariate normal distribution.

Totally Within Design 

Consider first the simplest possible repeated measurement design
with a covariate. Assume a single score is available for each of a
group of subjects on a covariate, X, as well as on each of two dependent
variables, $Y_1$ and $Y_2$. $Y_1$ and $Y_2$ must at least be
commensurate and will typically be scores on the same conceptual
variable assessed at two different points in time. For example, an
investigator might assume a linear relationship between a child's age
(X) and performance on a problem-solving task, with problem-solving
being assessed before and after instruction to yield two scores, $Y_1$
and $Y_2$. The primary questions of interest in such a design would
likely be whether there is an effect of instruction (and/or practice)
and whether this effect depends on the subject's age. That is, is there
a main effect of instruction, and is there an interaction of instruction
with age?

One might view the problem as involving the comparison of two
regression lines, that of $Y_2$ on X and that of $Y_1$ on X. Denote
these regression models as follows:

$$Y_2 = a_2 + b_2 x + e_2;$$
$$Y_1 = a_1 + b_1 x + e_1, \tag{1}$$

where we use a lower case x as a reminder that the covariate should be
expressed in deviation score form. Here of course the subscript
designates the dependent variable; were it to designate groups of
subjects then a test of the interaction of age with instructions would
correspond to the test of heterogeneity of regression in a
between-groups ANCOVA. In the present context, however, since the same
subjects serve in both conditions the errors, $e_1$ and $e_2$, will be
correlated and any test must take this into account. The appropriate
correction is easily accomplished by simply computing differences
between the observations in the two conditions, i.e., $Y_D = Y_2 - Y_1$,
and using this as the dependent variable. Use of such a variable is
prototypical of the "multivariate approach" to repeated measures (McCall
& Appelbaum, 1974). Then we have the single regression model

$$Y_D = a_D + b_D x + e_D. \tag{2}$$

Note that $b_D = b_2 - b_1$ and $a_D = a_2 - a_1$. Further, since x is
such that $\bar{x} = 0$, $\hat{a}_D = \bar{Y}_D = \bar{Y}_2 -
\bar{Y}_1$. Since $\sum \hat{e}_D^2 = (1 - r_{XD}^2) \sum (Y_D -
\bar{Y}_D)^2$, where $r_{XD}$ is the sample correlation between X and
$Y_D$, one can perform a test of the within-subject main effect (by
testing for whether $a_D$ is equal to zero) that will likely have
greater power than would the unadjusted test. Admittedly, because of the
unreliability of difference scores and the fact that typically
covariates will predict even true change considerably less well than
final level of performance, the gain in power resulting from using the
covariate will typically be less than in between-subject designs with
the same variables. Nonetheless, there are conditions where $r_{XD}$,
and hence the gain in power in the within-subjects analysis, will be
substantial (Delaney & Maxwell, 1981).
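
As a concrete illustration, the following minimal sketch carries out the
unadjusted and the covariate-adjusted test of the within-subject effect
on difference scores. It uses the Group C data from Table 1; the
function name `within_tests` and the dictionary layout are assumptions
made here for illustration, not part of the original analysis. Run on
these data it should reproduce the values reported in the Examples
section, F(1,9) = 3.807 and F(1,8) = 7.834.

```python
# Difference-score analysis for the Group C data in Table 1.
# Pure Python; helper name and output layout are illustrative assumptions.

X = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]             # covariate (age)
Y1 = [10, 12, 21, 25, 30, 36, 38, 40, 51, 59]  # score before instruction
Y2 = [20, 22, 32, 30, 35, 30, 40, 43, 51, 55]  # score after instruction


def within_tests(X, Y1, Y2):
    """Return unadjusted and covariate-adjusted F tests of the within effect."""
    n = len(X)
    yd = [after - before for before, after in zip(Y1, Y2)]  # Y_D = Y_2 - Y_1
    x_bar = sum(X) / n
    x = [v - x_bar for v in X]                 # centered covariate
    yd_bar = sum(yd) / n
    ss_total = sum((v - yd_bar) ** 2 for v in yd)
    sxx = sum(v ** 2 for v in x)
    sxy = sum(u * v for u, v in zip(x, yd))
    ss_error = ss_total - sxy ** 2 / sxx       # residual SS of model (2)
    ms_unadjusted = ss_total / (n - 1)
    ms_adjusted = ss_error / (n - 2)
    return {
        "a_D": yd_bar,                         # intercept equals mean of Y_D
        "b_D": sxy / sxx,                      # slope of Y_D on x
        "MS_unadjusted": ms_unadjusted,
        "F_unadjusted": n * yd_bar ** 2 / ms_unadjusted,  # df = (1, n - 1)
        "MS_adjusted": ms_adjusted,
        "F_adjusted": n * yd_bar ** 2 / ms_adjusted,      # df = (1, n - 2)
    }


result = within_tests(X, Y1, Y2)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Both F ratios share the numerator $n\bar{Y}_D^2$; only the error term
changes, which is why the gain in power depends on how much of the
variability in $Y_D$ the covariate predicts.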

Two points about such a test are deserving of comment. First, the
test involves a conditional inference. Statistical inferences are
restricted to subpopulations having the same particular values of X.
The test just described, which concerned the mean of the subpopulation
of $Y_D$ values having an X score equal to $\bar{X}$, is illustrative of
such a conditional inference. This test of the significance of the
change for a typical individual will usually be of most interest.
However, as Rogosa (1980) has made clear, one could "pick-a-point" other
than $\bar{X}$ at which to perform the significance test. For some
investigators this flexibility will not be an attraction; simply the
benefit of increased power of testing the within-subject factor will
more than offset the cost of restricting the inference to a conditional
one. This would almost certainly be the case in situations like the
example given above where the ages of the children would in fact likely
be chosen in advance, and thus making an inference conditional upon
those ages would be the intent of the investigator.

The second point concerns an issue about which varying opinions
have been expressed by methodologists. We have said that the regression
of the difference score on the covariate, e.g., as indicated by a test
of $b_D$ in (2) above, needs to be substantial in order for there to be
substantial benefit from using the covariate. However, a substantial
regression here means that there is violation of "the assumption of
homogeneity of regression" in that it indicates a difference between
$b_1$ and $b_2$. Some have argued in this context, e.g., Algina (1982),
that the unadjusted test of the within main effect is preferable to the
adjusted test, in part because the latter goes against the principle
that the main effect of a factor is meaningful only when it does not
interact with other factors. While a world with no interactions would be
simpler (and duller), ignoring them when they exist is not the optimal
strategy. We would opt instead for conducting conditional tests in this
situation, and agree with Rogosa (1980, p. 308) that one can
meaningfully interpret the average distance between two regression lines
even when the slopes differ.

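Following Rogosa's notation, let the difference between the
population regression lines at a specific value of X be denoted
$\Delta(x_i)$, which may be written

$$\Delta(x_i) = (a_2 - a_1) + (b_2 - b_1) x_i = a_D + b_D x_i$$

and which is estimated by

$$D(x_i) = \hat{a}_D + \hat{b}_D x_i.$$

Denoting the residual variance in model (2) as

$$\sigma^2_{e_D} = \sigma^2_{e_1} + \sigma^2_{e_2} - 2\sigma_{e_1 e_2},$$

where the numerical subscripts refer to the original variables $Y_1$ and
$Y_2$, then the variance of the sampling distribution of $D(x_i)$ may be
expressed here as

$$\sigma^2_{D(x_i)} = \sigma^2_{e_D} \left[ \frac{1}{n} + \frac{x_i^2}{\sum x^2} \right] \tag{3}$$

and may be estimated simply by substituting for $\sigma^2_{e_D}$ the
sample mean square error from model (2), i.e.,

$$\hat{\sigma}^2_{e_D} = s^2_{e_D} = \frac{\sum (Y_D - \hat{Y}_D)^2}{n - 2} = \frac{\sum \hat{e}_D^2}{n - 2}.$$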

Since (3) will be a minimum when x is at its mean of 0, one can inter- 
pret the test of the intercept of model (2), as being an evaluation of 
the difference between the two regression lines in (1) at the point 
where the estimate of that difference has the greatest precision. 
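
As a worked illustration of equation (3), the arithmetic below plugs in
the Group C values from the Examples section and Table 1: the adjusted
mean square error of 16.544, n = 10, and a sum of squared deviation
scores of 20 for the ten ages listed. The choice of the two evaluation
points (the mean age and an age two years below it) is ours, made only
to show how the variance grows away from the mean.

```latex
% Estimated variance of D(x_i) from equation (3), Group C data (Table 1):
% s^2 = 16.544, n = 10, \sum x^2 = 20 for ages 3,3,4,4,5,5,6,6,7,7
\hat{\sigma}^2_{D(0)} = 16.544\left[\tfrac{1}{10} + \tfrac{0^2}{20}\right] \approx 1.654
\hat{\sigma}^2_{D(-2)} = 16.544\left[\tfrac{1}{10} + \tfrac{(-2)^2}{20}\right] \approx 4.963
```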

Although one could perform and meaningfully interpret such a test
as one of the treatment effect for the average individual, typically one
would want to pursue further the nature of the attribute-by-treatment
interaction implied by the regression of $Y_D$ on x. Our earlier
discussion of this general problem (Delaney & Maxwell, 1981) did not
give sufficient attention to this point. One would like to perform the
equivalent of tests of the simple main effects of the treatment that are
frequently employed as follow-up tests of interactions in completely
crossed factorial designs. This could be done here by constructing
confidence intervals around the conditional means, $\mu_{D|x_i}$, for
the X values observed in the study. The $1 - \alpha$ interval for this
mean is bounded by

$$D(x_i) \pm t_{\alpha/2;\, n-2}\, \hat{\sigma}_{D(x_i)}. \tag{4}$$

Thus, analogously to the nonsimultaneous region of significance
typically used in Johnson-Neyman analyses of ATI's in between-subjects
designs, one can define a region composed of points on the X-axis for
which a $Y_D$ value of zero is outside the confidence interval at that
$x_i$. This would allow one to make statements about one's confidence of
the presence of the treatment effect for a particular value of X. Note
that because of the dependence of $\hat{\sigma}_{D(x)}$ on how far the
particular X score is from $\bar{X}$, it is possible that estimated
values of $\mu_{D|x_i}$ which are larger in absolute value than a
significant $\hat{a}_D$ may be judged nonsignificant because of the
widening confidence bands as you move away from $\bar{X}$.
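
The pick-a-point intervals can be sketched in a few lines. The example
below uses the Group C difference scores from Table 1 and a tabled
two-sided .05 critical t of 2.306 for 8 degrees of freedom; the alpha
level, the function names, and the choice to evaluate each observed age
are assumptions made for illustration.

```python
# Pick-a-point (non-simultaneous) intervals for the Group C data in Table 1.
import math

X = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
YD = [10, 10, 11, 5, 5, -6, 2, 3, 0, -4]   # Y_D = Y_2 - Y_1

T_CRIT = 2.306  # tabled two-sided .05 critical t for n - 2 = 8 df (assumed alpha)


def fit_difference_line(X, YD):
    """Least-squares fit of Y_D on the centered covariate."""
    n = len(X)
    x_bar = sum(X) / n
    yd_bar = sum(YD) / n
    sxx = sum((u - x_bar) ** 2 for u in X)
    sxy = sum((u - x_bar) * (v - yd_bar) for u, v in zip(X, YD))
    ss_error = sum((v - yd_bar) ** 2 for v in YD) - sxy ** 2 / sxx
    return n, x_bar, yd_bar, sxy / sxx, sxx, ss_error / (n - 2)


def pick_a_point(X, YD, point, crit):
    """Estimate D at one covariate value with its confidence limits."""
    n, x_bar, a_d, b_d, sxx, mse = fit_difference_line(X, YD)
    x = point - x_bar                          # deviation score
    d = a_d + b_d * x
    se = math.sqrt(mse * (1 / n + x ** 2 / sxx))
    return d, d - crit * se, d + crit * se


significant_ages = []
for age in sorted(set(X)):
    d, lower, upper = pick_a_point(X, YD, age, T_CRIT)
    if lower > 0 or upper < 0:                 # zero lies outside the interval
        significant_ages.append(age)
    print(f"age {age}: D = {d:6.2f}, interval = ({lower:6.2f}, {upper:6.2f})")
print(significant_ages)
```

An age enters the region of significance when zero falls outside its
interval, so the interval width, not just the size of the estimate,
determines the outcome.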

Alternatively, one could opt for a simultaneous inferential
procedure which would allow one to make statements about the reliability
of the difference between $\mu_{D|x}$ and 0 for a whole set of X values.
The Working-Hotelling formula for the confidence band for the entire
regression line may be used so that there will be a known level of
assurance that all estimates of the conditional means are correct (Neter
& Wasserman, 1974, pp. 149-154). Here the $1 - \alpha$ simultaneous
confidence band would consist of the concatenation of the intervals each
of which is bounded by

$$D(x_i) \pm W \hat{\sigma}_{D(x)},$$

where

$$W^2 = 2F_{\alpha;\, 2,\, n-2}.$$
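
A minimal sketch of the simultaneous band follows, again using the Group
C difference scores from Table 1. The tabled .05 critical F of 4.46 for
(2, 8) degrees of freedom, the alpha level, and the function name are
assumptions made for illustration.

```python
# Working-Hotelling simultaneous band for the Group C data in Table 1.
import math

X = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
YD = [10, 10, 11, 5, 5, -6, 2, 3, 0, -4]   # Y_D = Y_2 - Y_1

F_CRIT = 4.46  # tabled .05 critical F for (2, n - 2) = (2, 8) df (assumed alpha)


def working_hotelling_band(X, YD, points, f_crit):
    """Return {point: (lower, upper)} for the simultaneous confidence band."""
    n = len(X)
    x_bar = sum(X) / n
    yd_bar = sum(YD) / n
    sxx = sum((u - x_bar) ** 2 for u in X)
    sxy = sum((u - x_bar) * (v - yd_bar) for u, v in zip(X, YD))
    b_d = sxy / sxx
    mse = (sum((v - yd_bar) ** 2 for v in YD) - sxy ** 2 / sxx) / (n - 2)
    w = math.sqrt(2 * f_crit)                  # W replaces the critical t
    band = {}
    for p in points:
        x = p - x_bar
        d = yd_bar + b_d * x
        se = math.sqrt(mse * (1 / n + x ** 2 / sxx))
        band[p] = (d - w * se, d + w * se)
    return band


band = working_hotelling_band(X, YD, sorted(set(X)), F_CRIT)
simultaneous_ages = [p for p, (lower, upper) in band.items()
                     if lower > 0 or upper < 0]
print(simultaneous_ages)
```

Because W exceeds the corresponding critical t, the band is wider than
the pick-a-point intervals at every age, and fewer ages can be declared
significant.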



Between X Within Case 



Now consider a repeated measures design that also involves a
between-subjects factor as well as a covariate. Again we will for
simplicity restrict the discrete factors to two levels each. The
previously cited example could easily be expanded for this case. Assume
that children of various ages are randomly assigned either to a
treatment condition (T) designed to motivate them to do well in the
upcoming task, or to a control condition (C) where the initial
instruction is neutral. As before, all children's baseline and final
performance are then observed on a problem solving task.

Now one might view the problem as involving four regression lines

$$Y_{T2} = a_{T2} + b_{T2} x + e_{T2}$$
$$Y_{T1} = a_{T1} + b_{T1} x + e_{T1}$$
$$Y_{C2} = a_{C2} + b_{C2} x + e_{C2}$$
$$Y_{C1} = a_{C1} + b_{C1} x + e_{C1}$$

As before the numerical subscripts refer to levels of the
within-subjects factor, and the letter subscripts now designate levels
of the between-subjects factor. And, as before, it is convenient to work
with transformed scores to obtain the tests of interest. First consider
the between-subjects effects. Letting $Y_S = Y_1 + Y_2$, then the
between-subjects effects could be assessed in the context of a model
which allows for heterogeneous slopes:

$$Y_{ST} = a_{ST} + b_{ST} x + e_{ST}$$
$$Y_{SC} = a_{SC} + b_{SC} x + e_{SC}$$

where $b_{ST} = b_{T2} + b_{T1}$; $b_{SC} = b_{C1} + b_{C2}$; $a_{ST} =
a_{T2} + a_{T1}$; and $a_{SC} = a_{C1} + a_{C2}$. Rogosa (1980) has
fully developed such tests (these are illustrated below for the within
effects) and the arguments for them, which include the fact that any
heterogeneity of slopes across the levels of the between-subject factor
results in the test statistic for the conventional ANCOVA being
distributed as a non-central F. If the evidence for heterogeneity is
sufficiently weak that you choose to assume it non-existent, then the
typical ANCOVA using a pooled within-group slope estimate could be used:

$$Y_{ST} = a_{ST} + b_S x + e_{ST}$$
$$Y_{SC} = a_{SC} + b_S x + e_{SC}$$

The analysis of effects involving the within-subjects factor would
as before utilize difference scores. Two possible outcomes indicating
different kinds of heterogeneity are possible here. First, a significant
regression of the difference scores on the covariate would indicate, as
we discussed in our treatment of the totally within design, an ATI
involving the covariate and the within-subjects factor. In this event,
we would suggest evaluating the effect of the within factor, not only at
$\bar{X}$ but also at different points along the X dimension, as
outlined above. The only difference would be in the degrees of freedom
used to estimate residual error variance and hence determine the
critical values of the test statistics for the confidence intervals.

Secondly, a difference across levels of the between-subjects factor 
in the slopes of the regression lines would indicate a three-way 
interaction involving the between factor, the within factor and the 
covariate. Not surprisingly, there are a variety of ways one could 
proceed to specify the locus of the three-way interaction. Perhaps most 
straightforward would be to use Rogosa's methods for testing the verti- 
cal difference between two regression lines, keeping in mind that the 
dependent variable for the analysis is itself a difference score across 
levels of the within-subject factor. Thus one would be examining the 
two within-group regression lines:

$$Y_{DT} = a_{DT} + b_{DT} x + e_{DT}$$
$$Y_{DC} = a_{DC} + b_{DC} x + e_{DC} \tag{5}$$

The difference between the sample regression lines at any point on X,
$D(x_i)$, would here equal

$$D(x_i) = (\hat{a}_{DT} - \hat{a}_{DC}) + (\hat{b}_{DT} - \hat{b}_{DC}) x_i \tag{6}$$

Non-simultaneous inference procedures would be used to assess the
difference at a particular value of x, such as the grand mean on x.
Here one would estimate the variance of the sampling distribution of
$D(x_i)$ by using the following expression:

$$\hat{\sigma}^2_{D(x_i)} = s_e^2 \left[ \frac{1}{n_T} + \frac{1}{n_C} + \frac{(x_i - \bar{x}_T)^2}{\sum_T (x - \bar{x}_T)^2} + \frac{(x_i - \bar{x}_C)^2}{\sum_C (x - \bar{x}_C)^2} \right]$$

where $s_e^2$ is the pooled residual error variance estimate from the
model in (5). The critical value of a t test of $D(x_i)$ would now be
based on N - 4 degrees of freedom. A significant difference would be
interpreted to mean that for the subpopulation of people having that X
score, the within-subject effect in the treatment condition would be
reliably different from the effect of the within factor in the control
condition. Simultaneous inferences for a range of values of X could be
made (Rogosa, 1980) by constructing confidence intervals centered at
$D(x_i)$ and bounded by

$$D(x_i) \pm W \hat{\sigma}_{D(x)} \tag{7}$$

where

$$W^2 = 2F_{\alpha;\, 2,\, N-4}.$$
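
The steps in equations (5) through (7) can be sketched as follows, using
the difference scores for both groups in Table 1. The tabled critical
values (t = 2.120 for 16 df; F = 3.63 for (2, 16) df, both at the .05
level), the alpha level, and all function and variable names are
assumptions made for illustration. On these data the sketch should
reproduce the heterogeneity test reported in the Examples section,
F(1,16) = 28.559 with $MS_e$ = 11.699.

```python
# Between x within analysis of the Table 1 difference scores.
import math

# Covariate X (age) and difference scores Y_D = Y_2 - Y_1 for each group.
X_C = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
YD_C = [10, 10, 11, 5, 5, -6, 2, 3, 0, -4]
X_T = [3, 3, 4, 5, 5, 5, 6, 7, 7, 7]
YD_T = [7, 4, 7, 5, 11, 7, 12, 13, 17, 18]

T_CRIT = 2.120  # tabled two-sided .05 critical t for N - 4 = 16 df
F_CRIT = 3.63   # tabled .05 critical F for (2, N - 4) = (2, 16) df


def group_fit(X, Y):
    """Least-squares line of Y on X within one group."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sxx = sum((u - x_bar) ** 2 for u in X)
    sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(X, Y))
    sse = sum((v - y_bar) ** 2 for v in Y) - sxy ** 2 / sxx
    return {"n": n, "x_bar": x_bar, "y_bar": y_bar,
            "sxx": sxx, "slope": sxy / sxx, "sse": sse}


fit_t, fit_c = group_fit(X_T, YD_T), group_fit(X_C, YD_C)
df_error = fit_t["n"] + fit_c["n"] - 4
ms_error = (fit_t["sse"] + fit_c["sse"]) / df_error  # pooled s_e^2 from (5)

# Test of heterogeneity of regression (difference between the two slopes).
f_heterogeneity = (fit_t["slope"] - fit_c["slope"]) ** 2 / (
    ms_error * (1 / fit_t["sxx"] + 1 / fit_c["sxx"]))


def difference_at(x):
    """D(x) from equation (6) and its estimated standard error."""
    d = sum(sign * (g["y_bar"] + g["slope"] * (x - g["x_bar"]))
            for sign, g in ((1, fit_t), (-1, fit_c)))
    var = ms_error * sum(1 / g["n"] + (x - g["x_bar"]) ** 2 / g["sxx"]
                         for g in (fit_t, fit_c))
    return d, math.sqrt(var)


w = math.sqrt(2 * F_CRIT)                      # multiplier for equation (7)
pointwise_ages, simultaneous_ages = [], []
for age in range(3, 8):
    d, se = difference_at(age)
    if abs(d) > T_CRIT * se:
        pointwise_ages.append(age)
    if abs(d) > w * se:
        simultaneous_ages.append(age)

print(f"MS_e = {ms_error:.3f}, F(1,{df_error}) = {f_heterogeneity:.3f}")
print(pointwise_ages, simultaneous_ages)
```

Each group line is written around its own covariate mean, so the
standard error uses each group's own sum of squares, as in the variance
expression above.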

The method just described for analyzing the three-way interaction 
essentially examined the simple two-way interaction of the between and 
within factors at particular levels of the covariate. Alternatively, 
one could look at simple effects within the levels of a different
factor.

For example, one might proceed by examining the simple interaction 
of the within-factor and the covariate at each level of the between- 
subjects factor. This would involve tests made separately on the two 
equations in (5) , using the techniques for the totally within design 
separately for each. This would allow specification of the particular X 
values for which a significant effect of the within factor was observed 
for the treatment subjects, and a different set of X values for which 
the within effect was observed for controls. 

Finally, one might look at tests within levels of the repeated 
measures factor. This would imply analysis of the regression of the 
original dependent measures (instead of their sum or some other linear 
combination) on the covariate. Tests of heterogeneity of these
regression lines across groups would then be performed separately for
each repeated measure, and would be interpreted as simple interactions
of the between-subjects factor with the covariate at levels of the
within-subjects factor. The choice among these methods of analyzing the
three-way interaction would be guided by considerations of what results
are most interpretable in the particular context of any study.

Extensions 

Although we will not develop them here, the methods discussed in 
this paper can be extended to larger designs. With just two levels of 
the within-subjects factor, univariate tests similar to those we have 
discussed can be used with designs involving more between-subjects fac- 
tors. The true multivariate situation arises when there are more than 
two levels of the within-subject factor. In that case, methods utiliz- 
ing confidence intervals for predictions in multivariate multiple 
regression (see Finn, 1974, p. 121 ff.) can be used to generate analogous
procedures to those discussed here. 

Examples 

Table 1 contains two sets of hypothetical data that will be used to
illustrate the procedures we have described. The data for Group C
correspond to the example of the totally within design we discussed
initially, i.e., two problem solving scores, $Y_1$ and $Y_2$, are
available for each of a set of children of different ages, X. The
equivalent of a matched-pairs t-test comparing the means of $Y_1$ and
$Y_2$ does not reach significance, F(1,9) = 3.807, $MS_e$ = 34.044,
p > .05. However, when the test of this within-subjects factor is made
more sensitive by including $X - \bar{X}$ as a covariate in the model,
the adjusted test of the within effect is significant, F(1,8) = 7.834,
$MS_e$ = 16.544, p < .025. The sample regression lines, which are shown
in Figure 1 along with a scatterplot of the data, indicate the form of
the ATI that results in the conditional test of the mean change being
more powerful. The effect of the within-subjects factor in Group C is
seen to decrease as age increases. Thus, further tests of the effect of
the within-subjects factor at particular points on X would be of
interest and can be conducted by forming confidence intervals, as in
equation (4) above, around the difference in regression lines. These
intervals are sketched in Figure 2 where the solid lines indicate the
boundaries of the non-simultaneous confidence intervals and the dashed
lines indicate the boundaries of the simultaneous intervals. Using
non-simultaneous intervals one would conclude a significant within
effect at X values of 3, 4 and 5. Using simultaneous bounds, so that
assertions can be made at a specified $\alpha$ about the conditional
means for all values of X, results in being able to conclude a
significant within effect only at ages 3 and 4.

We may now illustrate the between x within analysis by combining
the data just analyzed with that labelled Group T in Table 1.
Scatterplots and regression lines for the Group T data are shown in
Figure 3. The regressions of the difference scores on X for the two
groups are shown in Figure 4. A standard ANCOVA of these difference
scores would have resulted in less sensitive tests of the within effects
than the unadjusted test, because there is no regression overall of the
difference scores on X, F(1,17) = 0, i.e., there is no overall
interaction of the covariate and the within effect. However, this
obtains because the simple interactions of the covariate with the within
factor differ across levels of the between-groups factor. A test for
heterogeneity of regression is highly significant, F(1,16) = 28.559,
$MS_e$ = 11.699, p < .001, indicating there is a prominent three-way
interaction. As we have indicated, a number of different approaches to
further analyses are possible in this situation. We illustrate here the
method of examining the difference between the regression lines at
different points on X. The difference between these regressions (of the
form of equation (6) above) is indicated by the dashed line in Figure 4.
Both simultaneous confidence intervals (cf. equation (7)) and
non-simultaneous confidence intervals around this line would lead one to
conclude here that the effect of the within factor is significantly
greater in the treatment condition than the control condition for ages
5, 6 and 7. The simple interaction of the between and within factors is
non-significant for ages 3 and 4. Finally, one might wish to follow up
these tests with still further analyses, e.g., of the "simple simple"
effects of instructions at particular ages within levels of the
treatment factor. These could be carried out by constructing the
appropriate confidence intervals around the regressions of the
difference scores on X, as illustrated in Figure 2.

Conclusion 

We have discussed a method of analyzing repeated measures designs
involving a covariate. Essentially, the method involves viewing the
effect of the within-subject factor as a linear function of the
covariate. We conclude that this approach to repeated measures designs
permits not only more sensitive overall tests of the effect of the
within-subject factor but also a thorough analysis of the ATI implied by
a significant regression of the within effect on the covariate.

Table 1

Data, Summary Statistics, and Sample Regression Equations

              Group C                        Group T

         X    Y_1    Y_2    Y_D         X    Y_1    Y_2    Y_D

         3     10     20     10         3     10     17      7
         3     12     22     10         3     13     17      4
         4     21     32     11         4     16     23      7
         4     25     30      5         5     20     25      5
         5     30     35      5         5     28     39     11
         5     36     30     -6         5     29     36      7
         6     38     40      2         6     40     52     12
         6     40     43      3         7     37     50     13
         7     51     51      0         7     53     70     17
         7     59     55     -4         7     62     80     18

Mean   5.0   32.2   35.8    3.6       5.2   30.8   40.9   10.1
S.D.   1.5   15.9   11.5    5.8       1.5   17.3   21.9    4.9

Regression Equation    SS_error      Regression Equation    SS_error

Figure 3. Regression of Y_1 and Y_2 on X in Group T.